Exploring Features for Text-dependent Speaker Verification in Distant Speech Signals

Author

  • B. Yegnanarayana
Abstract

Automatic speaker verification (ASV) is the task of verifying a person's claimed identity from their voice using a digital computer. Existing ASV systems achieve high verification accuracy when the speech signal is collected close to the mouth of the speaker (< 1 ft). However, their performance degrades significantly for speech signals collected at a distance from the speaker (2-6 ft). The objective of this work is to address some research issues in the processing of speech signals collected at a distance from the speaker, for a text-dependent ASV system. The distant speech signal is collected using a single-channel microphone. An acoustic feature derived from short segments of speech signals is proposed for the ASV task. The key idea is to exploit the high signal-to-noise ratio of short segments of speech in the vicinity of impulse-like excitations. We demonstrate that the proposed feature degrades less with distance than the widely used Mel-frequency cepstral coefficients (MFCCs), and also yields better speaker verification performance than MFCCs. We propose a method of begin-end detection based on the strength of the spectral peaks. A score normalization method is proposed that considers only the robust regions of the speech signal. In addition, the regions of the speech signal with a high signal-to-reverberation ratio are identified, and greater weight is given to these regions. These modifications are shown to result in a systematic improvement in the performance of the speaker verification system. The use of additional duration and pitch features is shown to further improve the performance of the speaker verification system for distant speech.
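The idea of giving greater weight to regions with a high signal-to-reverberation ratio (SRR) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say how SRR is estimated or how the weights are chosen. The function name `weighted_utterance_score`, the per-frame inputs, the 10 dB threshold, and the weight of 2.0 are all assumptions made for illustration only.

```python
from typing import Sequence


def weighted_utterance_score(
    frame_scores: Sequence[float],
    srr_db: Sequence[float],
    srr_threshold_db: float = 10.0,  # assumed threshold, not from the paper
    high_weight: float = 2.0,        # assumed weight, not from the paper
) -> float:
    """Combine per-frame match scores into one utterance-level score.

    Frames whose (hypothetical) SRR estimate is at or above the threshold
    count `high_weight` times; all other frames count once.
    """
    if len(frame_scores) != len(srr_db):
        raise ValueError("frame_scores and srr_db must have the same length")
    if len(frame_scores) == 0:
        raise ValueError("at least one frame is required")

    # Assign a larger weight to frames judged to be less reverberant.
    weights = [high_weight if s >= srr_threshold_db else 1.0 for s in srr_db]

    # Weighted mean of the frame scores.
    weighted_sum = sum(w * x for w, x in zip(weights, frame_scores))
    return weighted_sum / sum(weights)
```

With two frames scored 1.0 and 0.0, where only the first exceeds the SRR threshold, the weighted mean is pulled toward the more reliable frame (2/3 rather than 1/2). A real system would also need an SRR estimator and a principled choice of weights, neither of which is specified in the abstract.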

Similar resources

Exploring subsegmental and suprasegmental features for a text-dependent speaker verification in distant speech signals

Existing automatic speaker verification (ASV) systems perform with high accuracy when the speech signal is collected close to the mouth of the speaker (< 1 ft). However, the performance of these systems reduces significantly when speech signals are collected at a distance from the speaker (2-6 ft). The objective of this paper is to address some issues in the processing of speech signals collect...

Speaker verification based on the fusion of speech acoustics and inverted articulatory signals

We propose a practical, feature-level and score-level fusion approach by combining acoustic and estimated articulatory information for both text independent and text dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with the conventional acoustic features. On text independent spe...

Deep feature for text-dependent speaker verification

Recently deep learning has been successfully used in speech recognition, however it has not been carefully explored and widely accepted for speaker verification. To incorporate deep learning into speaker verification, this paper proposes novel approaches of extracting and using features from deep learning models for text-dependent speaker verification. In contrast to the traditional short-term ...

Tandem deep features for text-dependent speaker verification

Although deep learning has been successfully used in acoustic modeling of speech recognition, it has not been thoroughly investigated and widely accepted for speaker verification. This paper describes an investigation of using various types of deep features in a Tandem fashion for text-dependent speaker verification. Three types of networks are used to extract deep features: restricted Boltzman...

Deep Speaker Vectors for Semi Text-independent Speaker Verification

Recent research shows that deep neural networks (DNNs) can be used to extract deep speaker vectors (d-vectors) that preserve speaker characteristics and can be used in speaker verification. This new method has been tested on text-dependent speaker verification tasks, and improvement was reported when combined with the conventional i-vector method. This paper extends the d-vector approach to sem...

Publication date: 2010